Improving Low-Resource Cross-lingual Parsing with Expected Statistic Regularization
Authors
Abstract
We present Expected Statistic Regularization (ESR), a novel regularization technique that utilizes low-order multi-task structural statistics to shape model distributions for semi-supervised learning on low-resource datasets. We study ESR in the context of cross-lingual transfer for syntactic analysis (POS tagging and labeled dependency parsing) and present several classes of statistic functions that bear on model behavior. Experimentally, we evaluate the proposed statistics with unsupervised transfer on 5 diverse target languages and show that all statistics, when estimated accurately, yield improvements to both POS and LAS, with the best improving POS by +7.0 and LAS by +8.5 on average. We also present semi-supervised learning-curve experiments showing that ESR provides significant gains over strong cross-lingual-transfer-plus-fine-tuning baselines for modest amounts of labeled data. These results indicate that ESR is a promising approach that is complementary to model-transfer approaches for parsing.
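As a rough illustration of the kind of objective the abstract describes, the sketch below penalizes the gap between a model's expected value of one simple statistic on unlabeled text and an externally estimated target value. It is a minimal sketch only: the function name esr_penalty, the choice of statistic (the expected proportion of tokens carrying a particular POS tag), the tolerance band, and the way the penalty is combined with a supervised loss are all illustrative assumptions, not the paper's actual statistic families or loss form.

import torch

def esr_penalty(tag_log_probs: torch.Tensor,
                target_value: float,
                tag_index: int = 0,
                tolerance: float = 0.0) -> torch.Tensor:
    # Hypothetical sketch of an expected-statistic penalty (not the paper's code).
    # tag_log_probs: (batch, seq_len, num_tags) per-token tag log-probabilities
    # produced by the model on unlabeled target-language text (padding ignored here).
    probs = tag_log_probs.exp()
    # Illustrative statistic: the model's expected proportion of tokens
    # assigned the tag at `tag_index` (e.g., NOUN).
    expected = probs[..., tag_index].mean()
    # Penalize deviation of the expectation from the target estimate,
    # ignoring deviations inside a small tolerance band.
    gap = (expected - target_value).abs()
    return torch.clamp(gap - tolerance, min=0.0)

# Usage sketch: add the penalty, scaled by a weight, to whatever supervised loss
# is available (or use it alone for unsupervised transfer):
# loss = supervised_loss + reg_weight * esr_penalty(tag_log_probs, 0.22, tag_index=noun_id)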
Similar resources
Cross-Lingual Dependency Parsing with Late Decoding for Truly Low-Resource Languages
In cross-lingual dependency annotation projection, information is often lost during transfer because of early decoding. We present an end-to-end graph-based neural network dependency parser that can be trained to reproduce matrices of edge scores, which can be directly projected across word alignments. We show that our approach to cross-lingual dependency parsing is not only simpler, but also a...
Low Resource Dependency Parsing: Cross-lingual Parameter Sharing in a Neural Network Parser
Training a high-accuracy dependency parser requires a large treebank. However, these are costly and time-consuming to build. We propose a learning method that needs less data, based on the observation that there are underlying shared structures across languages. We exploit cues from a different source language in order to guide the learning process. Our model saves at least half of the annotati...
Cross-lingual RST Discourse Parsing
Discourse parsing is an integral part of understanding information flow and argumentative structure in documents. Most previous research has focused on inducing and evaluating models from the English RST Discourse Treebank. However, discourse treebanks for other languages exist, including Spanish, German, Basque, Dutch and Brazilian Portuguese. The treebanks share the same underlying linguistic...
Cross-Lingual Word Embeddings for Low-Resource Language Modeling
Most languages have no established writing system and minimal written records. However, textual data is essential for natural language processing, and particularly important for training language models to support speech recognition. Even in cases where text data is missing, there are some languages for which bilingual lexicons are available, since creating lexicons is a fundamental task of doc...
Unsupervised Ranked Cross-Lingual Lexical Substitution for Low-Resource Languages
We propose an unsupervised system for a variant of cross-lingual lexical substitution (CLLS) to be used in a reading scenario in computer-assisted language learning (CALL), in which single-word translations provided by a dictionary are ranked according to their appropriateness in context. In contrast to most alternative systems, ours does not rely on either parallel corpora or machine translati...
Journal
Journal title: Transactions of the Association for Computational Linguistics
Year: 2023
ISSN: 2307-387X
DOI: https://doi.org/10.1162/tacl_a_00537